OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Our setup is based on the following considerations. The default settings for finetuning on each dataset are shown in Table 1.

Table 1: End-to-end finetuning configurations for image-language downstream tasks.

Config                  | COCO (retrieval) & Flickr30k | COCO (captioning) | VQA
optimizer               | AdamW                        | AdamW             | AdamW
base learning rate      | 1e-5                         | 1e-5              | 2e-5
weight decay            | 0.05                         | 0.05              | 0.05
learning rate schedule  | linear decay                 | linear decay      | linear decay
batch size              | 512                          | 512               | 256
training epochs         | 10                           | 10                | 10

C.2 Video-Language Tasks

We demonstrate more comparison results using different pretraining paradigms (i.e., image-only, ...). Details of the pretraining data can be found in Table 4. The "img2vid" strategy is also adopted for further comparison, where we start with image-only pretraining. We can see that the captions generated by OmniVL are both natural and abundant. OmniVL can generate more fine-grained descriptions (line 1).

Figure 4: Some video captions generated by OmniVL.
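The settings in Table 1 can be expressed as a small configuration sketch. This is an illustration only, assuming plain Python dictionaries and a hypothetical `linear_decay_lr` helper; the paper does not publish this exact code.

```python
# Hypothetical sketch of the finetuning configurations from Table 1.
# Keys and helper names are illustrative assumptions, not the authors' code.

FINETUNE_CONFIGS = {
    "coco_retrieval_flickr30k": dict(optimizer="AdamW", base_lr=1e-5,
                                     weight_decay=0.05, schedule="linear_decay",
                                     batch_size=512, epochs=10),
    "coco_captioning":          dict(optimizer="AdamW", base_lr=1e-5,
                                     weight_decay=0.05, schedule="linear_decay",
                                     batch_size=512, epochs=10),
    "vqa":                      dict(optimizer="AdamW", base_lr=2e-5,
                                     weight_decay=0.05, schedule="linear_decay",
                                     batch_size=256, epochs=10),
}

def linear_decay_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Linearly decay the learning rate from base_lr at step 0 to 0 at total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

In a PyTorch-style training loop, `base_lr` would seed an AdamW optimizer and `linear_decay_lr` would update the learning rate once per step (or per epoch) over the 10-epoch schedule.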
ET5: A Novel End-to-end Framework for Conversational Machine Reading Comprehension
Xiao Zhang, Heyan Huang, Zewen Chi, Xian-Ling Mao
Conversational machine reading comprehension (CMRC) aims to help computers understand a natural language text and then engage in a multi-turn conversation to answer questions related to the text. Existing methods typically require three steps: (1) decision making based on entailment reasoning; (2) span extraction if required by the above decision; (3) question rephrasing based on the extracted span. However, in nearly all these methods, the span extraction and question rephrasing steps cannot fully exploit the fine-grained entailment reasoning information from the decision-making step because of their relative independence, which further enlarges the information gap between decision making and question rephrasing. To tackle this problem, we propose a novel end-to-end framework for conversational machine reading comprehension based on a shared-parameter mechanism, called entailment reasoning T5 (ET5). Despite the lightweight design of our proposed framework, experimental results show that ET5 achieves new state-of-the-art results on the ShARC leaderboard with a BLEU-4 score of 55.2. Our model and code are publicly available at https://github.com/Yottaxx/ET5.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Beijing > Beijing (0.05)
- Asia > India > West Bengal > Kolkata (0.04)